All the project code is available at https://github.com/Fiorellaps/PRA2_visualizacion
The dataset was taken from data.world and belongs to an article that analyzed 4 million Facebook posts to see what information they contained about Donald Trump and Hillary Clinton.
Accordingly, it contains Facebook posts written between 2012 and 2016 by 15 major media outlets.
The data is split into one CSV file per outlet, and only 5 of those outlets were selected here to reduce the cost of analyzing such a large dataset.
The resulting dataset has 135,544 rows, each of which corresponds to a Facebook post published by one of the following pages:
BBC (id 228735667216): The British Broadcasting Corporation.
CNN (id 5550296508): Multinational cable news channel headquartered in Atlanta, Georgia, U.S.
Fox (id 15704546335): The Fox News Channel.
ABC (id 86680728811): ABC News is the news division of the American broadcast network ABC.
LA Times (id 5863113009): Los Angeles Times: News from California, the nation and world.
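A minimal sketch of how posts can be labelled with a short page name, assuming they sit in a data frame with a `page_id` column. The ids and short names come from the list above; the helper name `label_pages` and the extra page id `1234567890` are made up for illustration.

```r
# Lookup from Facebook page id to a short page name (ids from the list above)
page_lookup <- c(
  "228735667216" = "bbc",
  "5550296508"   = "cnn",
  "15704546335"  = "fox",
  "86680728811"  = "abc",
  "5863113009"   = "laTimes"
)

# Hypothetical helper: label each post and drop pages outside the lookup
label_pages <- function(posts, lookup = page_lookup) {
  ids <- posts$page_id
  # Large numeric ids are written without scientific notation before matching
  if (is.numeric(ids)) ids <- sprintf("%.0f", as.numeric(ids))
  posts$page_name <- unname(lookup[as.character(ids)])
  posts[!is.na(posts$page_name), , drop = FALSE]
}

# Toy input: the third page id is invented and should be dropped
posts <- data.frame(
  page_id = c(228735667216, 5550296508, 1234567890),
  likes_count = c(242, 39, 7)
)
labelled <- label_pages(posts)
print(labelled)
```

A named vector keeps the id-to-name mapping in one place, so adding a sixth outlet would only need one more entry.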
| X | id | page_id | name | message | description | post_type | status_type | likes_count | comments_count | shares_count | love_count | wow_count | haha_count | sad_count | thankful_count | angry_count | posted_at | date | hour | page_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 228735667216_10151160265682217 | 228735667216 | BBC News Photos | Your best photographs of 2012. GALLERY: http://bbc.in/UvZxmS | not provided | photo | added_photos | 242 | 6 | 45 | 0 | 0 | 0 | 0 | 0 | 0 | 2012-12-30 | 2012-12-30 | 09:16:36 | bbc |
| 2 | 228735667216_10151161802472217 | 228735667216 | US congress in final push to reach ‘fiscal cliff’ deal | London’s leading shares have fallen amid fears that budget talks won’t stop the US sliding over the ‘fiscal cliff’. http://bbc.in/Uz888c Are you worried about the impact to the global economy, if US senators don’t reach a deal before New Year’s Day? http://bbc.in/Uz86x7 | | link | published_story | 39 | 56 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2012-12-31 | 2012-12-31 | 14:16:43 | bbc |
| X | id | page_id | name | message | description | post_type | status_type | likes_count | comments_count | shares_count | love_count | wow_count | haha_count | sad_count | thankful_count | angry_count | posted_at | date | hour | page_name | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 1 | Length:135544 | Min. : 5550296508 | Length:135544 | Length:135544 | Length:135544 | Length:135544 | Length:135544 | Min. : 1 | Min. : 0.0 | Min. : 0 | Min. : 0.0 | Min. : 0.00 | Min. : 0.00 | Min. : 0 | Min. : 0.0000 | Min. : 0.00 | Length:135544 | Length:135544 | Length:135544 | Length:135544 | |
| 1st Qu.: 38929 | Class :character | 1st Qu.: 5863113009 | Class :character | Class :character | Class :character | Class :character | Class :character | 1st Qu.: 568 | 1st Qu.: 82.0 | 1st Qu.: 88 | 1st Qu.: 0.0 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 0 | 1st Qu.: 0.0000 | 1st Qu.: 0.00 | Class :character | Class :character | Class :character | Class :character | |
| Median : 75102 | Mode :character | Median : 15704546335 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Median : 1745 | Median : 226.0 | Median : 282 | Median : 0.0 | Median : 0.00 | Median : 0.00 | Median : 0 | Median : 0.0000 | Median : 0.00 | Mode :character | Mode :character | Mode :character | Mode :character | |
| Mean : 75797 | NA | Mean : 60085410520 | NA | NA | NA | NA | NA | Mean : 5663 | Mean : 795.6 | Mean : 1573 | Mean : 132.5 | Mean : 58.12 | Mean : 63.62 | Mean : 105 | Mean : 0.1321 | Mean : 90.87 | NA | NA | NA | NA | |
| 3rd Qu.:113203 | NA | 3rd Qu.: 86680728811 | NA | NA | NA | NA | NA | 3rd Qu.: 4678 | 3rd Qu.: 673.0 | 3rd Qu.: 972 | 3rd Qu.: 9.0 | 3rd Qu.: 12.00 | 3rd Qu.: 4.00 | 3rd Qu.: 3 | 3rd Qu.: 0.0000 | 3rd Qu.: 3.00 | NA | NA | NA | NA | |
| Max. :150983 | NA | Max. :228735667216 | NA | NA | NA | NA | NA | Max. :1155249 | Max. :770366.0 | Max. :1934157 | Max. :213998.0 | Max. :45708.00 | Max. :45122.00 | Max. :149098 | Max. :1541.0000 | Max. :117430.00 | NA | NA | NA | NA |
This page shows general information about the data, along with charts of the number of posts by type and by publication date.
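As a rough illustration of the aggregations behind these charts, the sketch below counts posts by type and by year and month using base R only. The toy data frame is invented; only the column names `post_type` and `date` are taken from the dataset.

```r
# Toy stand-in for the dataset: values are illustrative, not real posts
posts <- data.frame(
  post_type = c("link", "photo", "link", "video", "link"),
  date = c("2012-12-30", "2012-12-31", "2013-01-02", "2013-01-15", "2013-02-01"),
  stringsAsFactors = FALSE
)

# Number of posts per type
by_type <- table(posts$post_type)

# Number of posts per year and month
dates <- as.Date(posts$date)
by_month <- as.data.frame(
  table(year = format(dates, "%Y"), month = format(dates, "%m")),
  responseName = "total",
  stringsAsFactors = FALSE
)
# Drop year/month combinations that have no posts
by_month <- by_month[by_month$total > 0, ]

print(by_type)
print(by_month)
```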
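Number of posts: 135,544
Number of pages: 5
Number of likes: 767,605,124
Number of links: 106,978
Number of videos: 11,264
Number of songs: 7
Number of events: 5
Number of photos: 17,290
Number of comments: 107,834,610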
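The dashboard displays these totals with a dot as the thousands separator, as is conventional in Spanish. One way to get that formatting without slicing the number by digit position is base R's `format()`. This is a sketch: the helper name `format_count` is an assumption, not part of the project.

```r
# Hypothetical helper: format a count with "." as the thousands separator
format_count <- function(x) {
  format(x, big.mark = ".", decimal.mark = ",", scientific = FALSE, trim = TRUE)
}

posts_label <- format_count(135544)
likes_label <- format_count(767605124)
songs_label <- format_count(7)

print(c(posts_label, likes_label, songs_label))
```

Setting `decimal.mark` to a comma avoids the warning R raises when both marks are the same character.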
This page shows the total number of reactions by type and page, to reveal how users tend to react to each page.
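A compact way to obtain all of these totals at once is to sum the reaction columns grouped by page. The sketch below uses base R's `rowsum()` on invented numbers; the column names match the dataset, but the values are not real.

```r
# Toy reactions table: values are invented for illustration
reactions <- data.frame(
  page_name      = c("bbc", "bbc", "cnn", "fox"),
  love_count     = c(0, 3, 5, 2),
  haha_count     = c(1, 1, 2, 6),
  sad_count      = c(0, 4, 1, 0),
  angry_count    = c(2, 0, 3, 8),
  thankful_count = c(0, 0, 0, 1)
)

reaction_cols <- c("love_count", "haha_count", "sad_count",
                   "angry_count", "thankful_count")

# One row per page, one column per reaction type
totals <- rowsum(reactions[, reaction_cols], group = reactions$page_name)

# Share of each reaction within a page, as a percentage
shares <- round(100 * prop.table(as.matrix(totals), margin = 1), 1)

print(totals)
print(shares)
```

Percentages make pages with very different audience sizes easier to compare than raw totals.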
This page analyzes the text of the posts to understand the relationships between words and the sentiments associated with them.
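The word relationships are studied through bigrams (pairs of consecutive words). The dashboard relies on tidytext for this; the sketch below shows the same idea in base R on three invented messages, with a hypothetical helper `make_bigrams`.

```r
# Hypothetical helper: split a message into lowercase words and pair neighbours
make_bigrams <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Invented example messages
messages <- c("This is not good news", "Not good at all", "Good news today")

# Count every bigram across all messages, most frequent first
bigram_counts <- sort(table(unlist(lapply(messages, make_bigrams))),
                      decreasing = TRUE)

# Bigrams that start with "not", the case used for negated sentiment
negated <- bigram_counts[startsWith(names(bigram_counts), "not ")]

print(bigram_counts)
print(negated)
```

Looking at words preceded by "not" matters because a word-level sentiment lexicon would otherwise score "not good" as positive.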
---
title: "Facebook data visualization"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
# Dashboard
library(flexdashboard)
# Data manipulation
library(tidyverse) # data manipulation & plotting
library(stringr) # text cleaning and regular expressions
library(tidytext) # provides additional text mining functions
library(textdata)
library(dplyr)
# Plots
library(viridis)
library(wordcloud2)
#devtools::install_github("gaospecial/wordcloud2")
#library("gaospecial/wordcloud2")
library(RColorBrewer)
library(tm)
library(ggplot2)
library(emojifont)
library(plotly)
library(gridExtra)
# PieChart
library(lessR)
# Network
library(igraph)
library(ggraph)
library(ggiraph)
#devtools::install_github("dgrtwo/drlib")
```
```{r}
data <- read.csv("facebook_dataset.csv", header=TRUE, stringsAsFactors = FALSE)
data$page_name<-""
data[data$page_id=="86680728811",]$page_name<-"abc"
data[data$page_id=="228735667216",]$page_name<-"bbc"
data[data$page_id=="5550296508",]$page_name<-"cnn"
data[data$page_id=="15704546335",]$page_name<-"fox"
data[data$page_id=="5863113009",]$page_name<-"laTimes"
```
Dataset {data-icon="fa-table"}
=============================
Row
--------------------------------------
<div style="height:100px">
### Code
All the project code is available at https://github.com/Fiorellaps/PRA2_visualizacion
</div>
Row
--------------------------------------
<div style="height:350px">
### Dataset description
The dataset was taken from [data.world](https://data.world/martinchek/2012-2016-facebook-posts) and belongs to an [article](https://shift.newco.co/2016/11/09/What-I-Discovered-About-Trump-and-Clinton-From-Analyzing-4-Million-Facebook-Posts/) that analyzed 4 million Facebook posts to see what information they contained about Donald Trump and Hillary Clinton.
Accordingly, it contains Facebook posts written between 2012 and 2016 by 15 major media outlets.
The data is split into one CSV file per outlet, and only 5 of those outlets were selected here to reduce the cost of analyzing such a large dataset.
The resulting dataset has 135,544 rows, each of which corresponds to a **Facebook post** published by one of the following pages:
- **BBC** (id 228735667216): The British Broadcasting Corporation.
- **CNN** (id 5550296508): Multinational cable news channel headquartered in Atlanta, Georgia, U.S.
- **Fox** (id 15704546335): The Fox News Channel.
- **ABC** (id 86680728811): ABC News is the news division of the American broadcast network ABC.
- **LA Times** (id 5863113009): Los Angeles Times: News from California, the nation and world.
</div>
Row
--------------------------------------
<div style="height:600px">
### Data preview
```{r}
knitr::kable(head(data, n=2))
```
</div>
Row
--------------------------------------
<div style="height:800px">
### Field summary
```{r}
knitr::kable(summary(data))
```
</div>
Statistics {data-icon="fa-chart-line"}
=============================
Row
--------------------------------------
<div style="height:100px">
### Description
This page shows **general information about the data**, along with charts of the number of posts by **type and publication date**.
</div>
Row {data-width=150}
--------------------------------------
### Number of posts
```{r}
# format() adds the thousands separators whatever the number of digits
total_post <- format(nrow(data), big.mark = ".", decimal.mark = ",", scientific = FALSE)
valueBox(value = total_post,icon = "fa-facebook",caption = "Number of posts",color = "#C1FFC1")
```
### Number of pages
```{r}
total_pages <- length(unique(data$page_id))
valueBox(value = total_pages,icon = "fa-file",caption = "Number of pages", color = "#FFD700")
```
### Number of likes
```{r}
likes_total <- sum(data$likes_count)
likes_total <- format(likes_total, big.mark = ".", decimal.mark = ",", scientific = FALSE)
valueBox(value = likes_total,icon = "fa-thumbs-up",caption = "Number of likes", color = "#FFEC8B")
```
Row {data-width=150}
--------------------------------------
### Number of links
```{r}
link_post <- nrow(data[data$post_type == "link",])
# format() adds the thousands separators whatever the number of digits
link_post <- format(link_post, big.mark = ".", decimal.mark = ",", scientific = FALSE)
valueBox(value = link_post,icon = "fa-link",caption = "Number of links",color = "#EEEEE0")
```
### Number of videos
```{r}
video_post <- nrow(data[data$post_type == "video",])
video_post <- format(video_post, big.mark = ".", decimal.mark = ",", scientific = FALSE)
valueBox(value = video_post,icon = "fa-video",caption = "Number of videos", color = "#BF3EFF")
```
### Number of songs
```{r}
music_post <- nrow(data[data$post_type == "music",])
valueBox(value = music_post,icon = "fa-music",caption = "Number of songs", color = "#7FFFD4")
```
Row {data-width=150}
--------------------------------------
### Number of events
```{r}
event_post <- nrow(data[data$post_type == "event",])
valueBox(value = event_post,icon = "fa-calendar",caption = "Number of events",color = "#FFA07A")
```
### Number of photos
```{r}
photo_post <- nrow(data[data$post_type == "photo",])
# format() adds the thousands separators whatever the number of digits
photo_post <- format(photo_post, big.mark = ".", decimal.mark = ",", scientific = FALSE)
valueBox(value = photo_post,icon = "fa-image",caption = "Number of photos", color = "#FF3030")
```
### Number of comments
```{r}
comments_total <- sum(data$comments_count)
comments_total <- format(comments_total, big.mark = ".", decimal.mark = ",", scientific = FALSE)
valueBox(value = comments_total,icon = "fa-comment",caption = "Number of comments", color = "#E0FFFF")
```
Row {data-height=500}
----------------------------------
<div style="height:500px">
### Number of posts by type
```{r}
data_post_type <- data %>% dplyr::group_by(post_type)
plot <- ggplot(data = data_post_type, aes(x = post_type, fill=post_type)) +
geom_bar() +
labs(x = "Post type", y = "No. of posts")
ggplotly(plot)
```
</div>
### Number of posts by page and type
```{r}
data_by_page_type <- data %>% dplyr::group_by(page_name, post_type) %>%
dplyr::summarise(total = n(), .groups = "drop")
# Stacked bar chart: one bar per page, split by post type
plot <- ggplot(data_by_page_type, aes(fill=post_type, y=total, x=page_name)) +
geom_bar(position="stack", stat="identity") +
xlab("Page") +
ylab("No. of posts")
ggplotly(plot)
```
Row
----------------------------------
<div style="height:500px">
### Grouped by year
```{r}
data_month <- data %>% dplyr::group_by(lubridate::month(date),lubridate::year(date)) %>%
dplyr::summarise(total = n()) %>% dplyr::arrange(desc(total))
colnames(data_month) <- c("month", "year", "total")
data_month$month <- as.character(data_month$month)
data_month$year <- as.character(data_month$year)
data_month$month <- factor(data_month$month, levels = c("1","2","3","4","5","6", "7","8","9", "10", "11", "12"), labels = c("01","02","03","04","05","06", "07","08","09", "10", "11", "12"))
data_month <- data_month[order(data_month$month),]
plot <- ggplot(data=data_month, mapping=aes(x=month, y=total, shape=year, color=year)) +
geom_point() +
geom_line(aes(group = 1)) +
facet_grid(facets = year ~ ., margins = FALSE) + theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5)) +
ggtitle("Number of posts by month and year") +
xlab("Month") +
ylab("No. of posts")
ggplotly(plot)
```
</div>
<div style="height:500px">
### Overall view
```{r}
plot <- ggplot(data=data_month, mapping=aes(x=month, y=total, shape=year, color=year)) +
geom_point() +
geom_line(aes(group = year))+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5))+
ggtitle("Number of posts by month and year") +
xlab("Month") +
ylab("No. of posts")
ggplotly(plot)
```
</div>
Row
----------------------------------
<div style="height:600px">
### Number of posts by page and year
```{r}
data_by_page <- data %>% dplyr::group_by(page_name, lubridate::year(date)) %>%
dplyr::summarise(total = n()) %>% dplyr::arrange(desc(total))
colnames(data_by_page) <- c("page", "year", "total")
data_by_page <- data_by_page[!is.na(data_by_page$page),]
data_by_page$page <- as.character(data_by_page$page)
plot <- ggplot(data_by_page, aes(year, page)) +
geom_point(aes(size=total), colour = "red") +
scale_color_manual(values=c("black", "dodgerblue")) +
ggtitle("Posts by page and year") +
theme(legend.position = 'none', plot.title = element_text(hjust = 0.5)) +
xlab("Year") +
ylab("Page")
ggplotly(plot)
```
</div>
Reactions {data-icon="fa-comments"}
===
Row
--------------------------------------
<div style="height:100px">
### Description
This page shows the total number of reactions by type and page, to reveal how users tend to react to each page.
</div>
Row
-----------------------------------------------------------------------
```{r}
emojis <- c(#emoji("email"),# share
#emoji("thumbsup"),# like
emoji("heart"),# love
emoji("joy"),# haha
emoji("cry"),# sad
emoji("rage"),# angry
emoji("innocent")# thankful
)
#names <- c("shares","likes","love","haha","sad","angry", "thankful")
names <- c("love","haha","sad","angry", "thankful")
```
<div style="height:400px">
### BBC reactions
```{r}
data_bbc <- tibble(names = names,
emoji = emojis,
total = c(#sum(data$shares_count[data$page_name=="bbc"]),
#sum(data$likes_count[data$page_name=="bbc"]),
sum(data$love_count[data$page_name=="bbc"]),
sum(data$haha_count[data$page_name=="bbc"]),
sum(data$sad_count[data$page_name=="bbc"]),
sum(data$angry_count[data$page_name=="bbc"]),
sum(data$thankful_count[data$page_name=="bbc"])
))
plot <- ggplot(data_bbc, aes(names, total, label = emoji, fill=names)) +
geom_bar(stat = "identity") +
geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
scale_x_discrete(breaks = data_bbc$names, labels = data_bbc$emoji) +
ggtitle("Reactions to BBC posts") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
xlab("Reaction")
ggplotly(plot)
```
</div>
### CNN reactions
```{r}
data_cnn <- tibble(names = names,
emoji = emojis,
total = c(#sum(data$shares_count[data$page_name=="cnn"]),
#sum(data$likes_count[data$page_name=="cnn"]),
sum(data$love_count[data$page_name=="cnn"]),
sum(data$haha_count[data$page_name=="cnn"]),
sum(data$sad_count[data$page_name=="cnn"]),
sum(data$angry_count[data$page_name=="cnn"]),
sum(data$thankful_count[data$page_name=="cnn"])
))
plot <- ggplot(data_cnn, aes(names, total, label = emoji, fill=names)) +
geom_bar(stat = "identity") +
geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
scale_x_discrete(breaks = data_cnn$names, labels = data_cnn$emoji) +
ggtitle("Reactions to CNN posts") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
xlab("Reaction")
ggplotly(plot)
```
### ABC reactions
```{r}
data_abc <- tibble(names = names,
emoji = emojis,
total = c(#sum(data$shares_count[data$page_name=="bbc"]),
#sum(data$likes_count[data$page_name=="bbc"]),
sum(data$love_count[data$page_name=="abc"]),
sum(data$haha_count[data$page_name=="abc"]),
sum(data$sad_count[data$page_name=="abc"]),
sum(data$angry_count[data$page_name=="abc"]),
sum(data$thankful_count[data$page_name=="abc"])
))
plot <- ggplot(data_abc, aes(names, total, label = emoji, fill=names)) +
geom_bar(stat = "identity") +
geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
scale_x_discrete(breaks = data_abc$names, labels = data_abc$emoji) +
ggtitle("Reactions to ABC posts") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
xlab("Reaction")
ggplotly(plot)
```
Row
-----------------------------------------------------------------------
<div style="height:400px">
### The Fox reactions
```{r}
data_fox <- tibble(names = names,
emoji = emojis,
total = c(#sum(data$shares_count[data$page_name=="fox"]),
#sum(data$likes_count[data$page_name=="fox"]),
sum(data$love_count[data$page_name=="fox"]),
sum(data$haha_count[data$page_name=="fox"]),
sum(data$sad_count[data$page_name=="fox"]),
sum(data$angry_count[data$page_name=="fox"]),
sum(data$thankful_count[data$page_name=="fox"])
))
plot <- ggplot(data_fox, aes(names, total, label = emoji, fill=names)) +
geom_bar(stat = "identity") +
geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
scale_x_discrete(breaks = data_fox$names, labels = data_fox$emoji) +
ggtitle("Reactions to The Fox News Channel posts") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
xlab("Reaction")
ggplotly(plot)
```
</div>
### LA Times reactions
```{r}
data_laTimes <- tibble(names = names,
emoji = emojis,
total = c(#sum(data$shares_count[data$page_name=="laTimes"]),
#sum(data$likes_count[data$page_name=="laTimes"]),
sum(data$love_count[data$page_name=="laTimes"]),
sum(data$haha_count[data$page_name=="laTimes"]),
sum(data$sad_count[data$page_name=="laTimes"]),
sum(data$angry_count[data$page_name=="laTimes"]),
sum(data$thankful_count[data$page_name=="laTimes"])
))
plot <- ggplot(data_laTimes, aes(names, total, label = emoji, fill=names)) +
geom_bar(stat = "identity") +
geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
scale_x_discrete(breaks = data_laTimes$names, labels = data_laTimes$emoji) +
ggtitle("Reactions to Los Angeles Times posts") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
xlab("Reaction")
ggplotly(plot)
```
Row
-----------------------------------------------------------------------
<div style="height:400px">
### Donut chart: BBC reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_bbc,
fill = "viridis",
hole = 0.5,
main = "BBC",
values_size=2.2,
labels_cex=2,
main_cex=2)
```
</div>
### Donut chart: CNN reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_cnn,
fill = "viridis",
hole = 0.5,
main = "CNN",
values_size=2.2,
labels_cex=2,
main_cex=2)
```
### Donut chart: ABC reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
# Third donut of the row: ABC (the BBC donut is already drawn first)
lessR::PieChart(x=names,
y =total,
data = data_abc,
fill = "viridis",
hole = 0.5,
main = "ABC",
values_size=2.2,
labels_cex=2,
main_cex=2)
```
Row
-----------------------------------------------------------------------
<div style="height:400px">
### Donut chart: The Fox reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_fox,
fill = "viridis",
hole = 0.5,
main = "The Fox",
values_size=2.2,
labels_cex=2,
main_cex=2)
```
</div>
### Donut chart: LA Times reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_laTimes,
fill = "viridis",
hole = 0.5,
main = "LA Times",
values_size=2.2,
labels_cex=2,
main_cex=2)
```
Text and sentiment analysis {data-icon="fa-user"}
===
Row
--------------------------------------
<div style="height:100px">
### Description
This page presents a **text analysis** of the posts to understand the **relationships** between words and the **sentiments** associated with them.
</div>
Row {.tabset .tabset-fade}
--------
```{r}
get_bigram_filtered <- function(filter_word){
filtered_data <- data %>% dplyr::filter(stringr::str_detect(tolower(message), filter_word))
filtered_data <- filtered_data[filtered_data$post_type!="link" & filtered_data$post_type!="photo" ,]
filtered_data <- filtered_data[, c('page_name', 'name', 'message')]
data_bigram <- filtered_data %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
data_bigram <- data_bigram %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word,
!word2 %in% stop_words$word) %>%
count(page_name, word1, word2, sort = TRUE) %>%
unite("bigram", c(word1, word2), sep = " ") %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "http")) %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "bbc")) %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "fox")) %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "abc"))
data_bigram <- by(data_bigram, data_bigram["page_name"], head, n=8)
data_bigram <- Reduce(rbind, data_bigram)
data_bigram
}
```
<div style="height:600px">
### Basketball
```{r}
data_bigram_sport <- get_bigram_filtered("basketball")
```
#### Filtered word: Basketball
```{r}
plot1 <- data_bigram_sport %>%
mutate(page_name = factor(page_name) %>% forcats::fct_rev()) %>%
ggplot(aes(drlib::reorder_within(bigram, n, page_name), n, fill = page_name)) +
geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
drlib::scale_x_reordered() +
facet_wrap(~ page_name, ncol = 2, scales = "free") +
coord_flip() +
ylab("") +
xlab("") +
ggtitle("Most frequent word pairs per page, filtered by: Basketball") +
theme(legend.position = 'none', plot.title = element_text(hjust = 0.5))
ggplotly(plot1)
```
</div>
<div style="height:600px">
### Image
```{r}
data_bigram_image <- get_bigram_filtered("image")
```
#### Filtered word: Image
```{r}
plot1 <- data_bigram_image %>%
mutate(page_name = factor(page_name) %>% forcats::fct_rev()) %>%
ggplot(aes(drlib::reorder_within(bigram, n, page_name), n, fill = page_name)) +
geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
drlib::scale_x_reordered() +
facet_wrap(~ page_name, ncol = 2, scales = "free") +
coord_flip() +
ylab("") +
xlab("") +
ggtitle("Most frequent word pairs per page, filtered by: Image") +
theme(legend.position = 'none', plot.title = element_text(hjust = 0.5))
ggplotly(plot1)
```
</div>
Row {.tabset .tabset-fade}
--------
```{r}
get_wordcloud_data <- function(filter_word){
# Match case-insensitively, as get_bigram_filtered() does
filtered_data <- data %>% dplyr::filter(stringr::str_detect(tolower(message), filter_word))
text <- filtered_data$message
docs <- tm::Corpus(VectorSource(text))
docs <- docs %>%
tm::tm_map(tm::removeNumbers) %>%
tm::tm_map(tm::removePunctuation) %>%
tm::tm_map(tm::stripWhitespace)
docs <- tm::tm_map(docs, tm::content_transformer(tolower))
docs <- tm::tm_map(docs, tm::removeWords, tm::stopwords("english"))
dtm <- tm::TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix),decreasing=TRUE)
df <- data.frame(word = names(words),freq=words)
df <- df %>% mutate_at(vars(word), function(x){gsub('[^ -~]', '', x)})
# Keep at most the 500 most frequent words; head() avoids NA rows when there are fewer
head(df, 500)
}
```
```{r}
data_wordcloud <- get_wordcloud_data("sport")
```
<div style="height:600px">
### Normal
#### Standard wordcloud
```{r message=FALSE}
wordcloud2(data=data_wordcloud, size=1.6, color='random-dark')
```
</div>
<div style="height:600px">
### Color
#### Wordcloud with custom colors
```{r message=FALSE}
wordcloud2(data=data_wordcloud, size=1.6, color='random-light', backgroundColor="black")
```
</div>
<div style="height:600px; display: flex; justify-content: center;">
### Shape
#### Wordcloud with a custom shape
```{r message=FALSE}
wordcloud2(data_wordcloud, size = 0.7, shape = 'star')
```
</div>
Row
------
<div style="height:600px">
### Sentiment analysis (AFINN)
```{r}
data_bigram_all <- data[, c('page_name', 'name', 'message')]
data_bigram_all <- data_bigram_all %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
AFINN <- get_sentiments("afinn")
nots <- data_bigram_all %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(word1 == "not") %>%
inner_join(AFINN, by = c(word2 = "word")) %>%
count(word2, value, sort = TRUE)
plot <- nots %>%
mutate(contribution = n * value) %>%
arrange(desc(abs(contribution))) %>%
head(20) %>%
ggplot(aes(reorder(word2, contribution), n * value, fill = n * value > 0)) +
geom_bar(stat = "identity", show.legend = FALSE) +
xlab("Words preceded by 'not'") +
ylab("Sentiment value * number of occurrences") +
coord_flip() +
theme(legend.position = 'none')
ggplotly(plot)
```
</div>
Row
-----
<div style="height:800px">
### Correlation network for the "trump" filter
```{r}
filtered_data <- data %>% dplyr::filter(stringr::str_detect(tolower(message), "trump"))
filtered_data <- filtered_data[, c('page_name', 'name', 'message')]
data_bigram <- filtered_data %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
data_bigram <- data_bigram %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "http")) %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "bbc")) %>%
dplyr::filter(!stringr::str_detect(tolower(bigram), "fox"))
bigram_graph <- data_bigram %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word,
!word2 %in% stop_words$word) %>%
count(word1, word2, sort = TRUE) %>%
# Keep word1 and word2 as separate columns: graph_from_data_frame() reads the first two columns as edge endpoints
filter(n > 50) %>%
graph_from_data_frame()
```
```{r}
set.seed(123)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
plot <- ggraph(bigram_graph, layout = "fr") +
geom_edge_link() +
geom_node_point(color = vcount(bigram_graph) , size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void()
girafe(ggobj = plot, width_svg = 10, height_svg = 10,
options = list(opts_sizing(rescale = TRUE, width = .6)))
```
</div>